
LAVA: Language Model Assisted Verbal Autopsy for Cause-of-Death Determination

Chen, Yiqun T., McCormick, Tyler H., Liu, Li, Datta, Abhirup

arXiv.org Artificial Intelligence

Verbal autopsy (VA) is a critical tool for estimating causes of death in resource-limited settings where medical certification is unavailable. This study presents LA-VA, a proof-of-concept pipeline that combines Large Language Models (LLMs) with traditional algorithmic approaches and embedding-based classification for improved cause-of-death prediction. Using the Population Health Metrics Research Consortium (PHMRC) dataset across three age categories (Adult: 7,580; Child: 1,960; Neonate: 2,438), we evaluate multiple approaches: GPT-5 predictions, LCVA baseline, text embeddings, and meta-learner ensembles. Our results demonstrate that GPT-5 achieves the highest individual performance with average test site accuracies of 48.6% (Adult), 50.5% (Child), and 53.5% (Neonate), outperforming traditional statistical machine learning baselines by 5-10%. Our findings suggest that simple off-the-shelf LLM-assisted approaches could substantially improve verbal autopsy accuracy, with important implications for global health surveillance in low-resource settings.
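The meta-learner ensemble mentioned in the abstract can be illustrated with a toy stand-in. The paper does not specify its combiner here, so the sketch below simply merges per-model cause predictions (e.g. from an LLM, an algorithmic baseline, and an embedding classifier) by accuracy-weighted voting; the function name, causes, and weights are illustrative assumptions, not the authors' code.

```python
from collections import Counter

def weighted_vote(predictions, weights):
    """Combine one death's predicted causes from several base models,
    weighting each model's vote by its held-out accuracy."""
    scores = Counter()
    for cause, weight in zip(predictions, weights):
        scores[cause] += weight
    # most_common(1) returns [(cause, score)] for the top-scoring cause
    return scores.most_common(1)[0][0]
```

For example, with base-model predictions `["TB", "TB", "Stroke"]` and held-out accuracies `[0.4, 0.3, 0.5]`, the two weaker models jointly outvote the stronger one (0.7 vs. 0.5) and "TB" wins.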


Increasing Interaction Fidelity: Training Routines for Biomechanical Models in HCI

Miazga, Michał Patryk, Ebel, Patrick

arXiv.org Artificial Intelligence

Biomechanical forward simulation holds great potential for HCI, enabling the generation of human-like movements in interactive tasks. However, training biomechanical models with reinforcement learning is challenging, particularly for precise and dexterous movements like those required for touchscreen interactions on mobile devices. Current approaches are limited in their interaction fidelity, require restricting the underlying biomechanical model to reduce complexity, and do not generalize well. In this work, we propose practical improvements to training routines that reduce training time, increase interaction fidelity beyond existing methods, and enable the use of more complex biomechanical models. Using a touchscreen pointing task, we demonstrate that curriculum learning, action masking, more complex network configurations, and simple adjustments to the simulation environment can significantly improve the agent's ability to learn accurate touch behavior. Our work provides HCI researchers with practical tips and training routines for developing biomechanical models with better human-like interaction fidelity.
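Action masking, one of the techniques the abstract lists, is easy to state concretely: actions that are invalid in the current state get a logit of negative infinity before the softmax, so the policy assigns them exactly zero probability and can never sample them. The snippet below is a minimal NumPy illustration of that idea, not the authors' training code.

```python
import numpy as np

def masked_softmax(logits, valid):
    """Turn policy logits into action probabilities while masking
    invalid actions: masked entries get -inf, hence probability 0."""
    masked = np.where(valid, logits, -np.inf)
    shifted = masked - masked.max()  # subtract max for numerical stability
    exp = np.exp(shifted)            # exp(-inf) evaluates to 0
    return exp / exp.sum()
```

With `logits = [1.0, 2.0, 3.0]` and `valid = [True, True, False]`, the third action's probability is exactly 0 and the remaining mass is renormalized over the first two actions.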


Mpemba Effect in Large-Language Model Training Dynamics: A Minimal Analysis of the Valley-River Model

Liu, Sibei, Hu, Zhijian

arXiv.org Artificial Intelligence

Learning rate (LR) schedules in large language model (LLM) training often follow empirical templates: warm-up, constant plateau/stable phase, and decay (WSD). However, the mechanistic explanation for this strategy remains underexplored, and the choice of plateau height and decay schedule is largely heuristic. In this paper, we connect training dynamics to a thermodynamic analogy via the Mpemba effect - a phenomenon in which a hotter system cools faster than a colder one when quenched into the same bath. We analyze a class of "valley-river" loss landscapes, where sharp (valley) directions equilibrate quickly, while flatter (river) directions govern global descent. The Mpemba effect provides an explanation for the necessity of the warm-up phase and motivates a high plateau - rather than a low one - for accelerating loss decrease during decay. We show that for certain loss landscapes, there exists an optimal plateau learning rate - the "strong Mpemba point" - at which the slowest mode vanishes, resulting in faster convergence during the decay phase. We derive analytical conditions for its existence and estimate decay dynamics required to preserve the Mpemba advantage. Our minimal model and analysis offer a principled justification for plateau-based schedulers and provide guidance for tuning LR in LLMs with minimal hyperparameter sweep.


Bayesian Federated Cause-of-Death Classification and Quantification Under Distribution Shift

Zhu, Yu, Li, Zehang Richard

arXiv.org Artificial Intelligence

In regions lacking medically certified causes of death, verbal autopsy (VA) is a critical and widely used tool to ascertain the cause of death through interviews with caregivers. Data collected by VAs are often analyzed using probabilistic algorithms. The performance of these algorithms often degrades due to distributional shift across populations. Most existing VA algorithms rely on centralized training, requiring full access to training data for joint modeling. This is often infeasible due to privacy and logistical constraints. In this paper, we propose a novel Bayesian Federated Learning (BFL) framework that avoids data sharing across multiple training sources. Our method enables reliable individual-level cause-of-death classification and population-level quantification of cause-specific mortality fractions (CSMFs), in a target domain with limited or no local labeled data. The proposed framework is modular, computationally efficient, and compatible with a wide range of existing VA algorithms as candidate models, facilitating flexible deployment in real-world mortality surveillance systems. We validate the performance of BFL through extensive experiments on two real-world VA datasets under varying levels of distribution shift. Our results show that BFL significantly outperforms the base models built on a single domain and achieves comparable or better performance compared to joint modeling.
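Population-level CSMF estimates like those produced here are conventionally scored with CSMF accuracy (Murray et al., 2011), a standard metric in the VA literature: one minus the total absolute error, normalized by the worst achievable error. A minimal sketch of that standard metric, not code from this paper:

```python
def csmf_accuracy(true_csmf, est_csmf):
    """CSMF accuracy: 1 - sum|true - est| / (2 * (1 - min(true))).
    Equals 1.0 for a perfect estimate; the denominator is the
    maximum possible total absolute error for these true fractions."""
    err = sum(abs(t - e) for t, e in zip(true_csmf, est_csmf))
    return 1.0 - err / (2.0 * (1.0 - min(true_csmf)))
```

For instance, with true fractions `[0.5, 0.3, 0.2]`, swapping the first two causes in the estimate gives a total error of 0.4 against a worst case of 1.6, so the accuracy is 0.75.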


Expanding FLORES+ Benchmark for more Low-Resource Settings: Portuguese-Emakhuwa Machine Translation Evaluation

Ali, Felermino D. M. Antonio, Cardoso, Henrique Lopes, Sousa-Silva, Rui

arXiv.org Artificial Intelligence

As part of the Open Language Data Initiative shared tasks, we have expanded the FLORES+ evaluation set to include Emakhuwa, a low-resource language widely spoken in Mozambique. We translated the dev and devtest sets from Portuguese into Emakhuwa, and we detail the translation process and quality assurance measures used. Our methodology involved various quality checks, including post-editing and adequacy assessments. The resulting datasets consist of multiple reference sentences for each source. We present baseline results from training a Neural Machine Translation system and fine-tuning existing multilingual translation models. Our findings suggest that spelling inconsistencies remain a challenge in Emakhuwa. Additionally, the baseline models underperformed on this evaluation set, underscoring the necessity for further research to enhance machine translation quality for Emakhuwa. The data is publicly available at https://huggingface.co/datasets/LIACC/Emakhuwa-FLORES.


From Narratives to Numbers: Valid Inference Using Language Model Predictions from Verbal Autopsy Narratives

Fan, Shuxian, Visokay, Adam, Hoffman, Kentaro, Salerno, Stephen, Liu, Li, Leek, Jeffrey T., McCormick, Tyler H.

arXiv.org Machine Learning

In settings where most deaths occur outside the healthcare system, verbal autopsies (VAs) are a common tool to monitor trends in causes of death (COD). VAs are interviews with a surviving caregiver or relative that are used to predict the decedent's COD. Turning VAs into actionable insights for researchers and policymakers requires two steps: (i) predicting likely COD using the VA interview and (ii) performing inference with predicted CODs (e.g. modeling the breakdown of causes by demographic factors using a sample of deaths). In this paper, we develop a method for valid inference using outcomes (in our case COD) predicted from free-form text using state-of-the-art NLP techniques. This method, which we call multiPPI++, extends recent work in "prediction-powered inference" to multinomial classification. We leverage a suite of NLP techniques for COD prediction and, through empirical analysis of VA data, demonstrate the effectiveness of our approach in handling transportability issues. multiPPI++ recovers ground truth estimates, regardless of which NLP model produced predictions and regardless of whether they were produced by a more accurate predictor like GPT-4-32k or a less accurate predictor like KNN. Our findings demonstrate the practical importance of inference correction for public health decision-making and suggest that if inference tasks are the end goal, a small amount of contextually relevant, high-quality labeled data is essential regardless of the NLP algorithm.
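The core prediction-powered correction is simple to state for a single cause proportion (multiPPI++ generalizes it to the multinomial case). The sketch below is the classical binary-case rectifier from the prediction-powered inference literature, not the authors' implementation: average the model's predictions over the large unlabeled pool, then add a bias correction estimated on the small labeled set.

```python
import numpy as np

def ppi_proportion(y_lab, yhat_lab, yhat_unlab):
    """Prediction-powered estimate of a cause's prevalence.
    y_lab:      true 0/1 labels on the small labeled set
    yhat_lab:   model predictions on that same labeled set
    yhat_unlab: model predictions on the large unlabeled pool
    The rectifier mean(y_lab) - mean(yhat_lab) corrects the bias
    of naively averaging predictions on the unlabeled pool."""
    rectifier = np.mean(y_lab) - np.mean(yhat_lab)
    return np.mean(yhat_unlab) + rectifier
```

If the predictor systematically under-calls a cause, the naive unlabeled-pool average inherits that bias, while the rectified estimate shifts it back by the labeled-set discrepancy.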


Episode 42: How Far Can We Take AI?

#artificialintelligence

On this episode of the eeDesignIt Podcast, we're joined by Dhonam Pemba to explore artificial intelligence (AI) and his new company, KidX AI. Dhonam holds a PhD in neural engineering, is a former rocket scientist, and is a serial AI entrepreneur. He was CTO of Kadho, which was acquired by Roybi for its Voice AI technology. As Chief Scientist at Kadho Sports, he served clients in MLB, USA Volleyball, the NFL, NHL, NBA, and NCAA. His latest company, KidX, is in the AI edtech space, where he has built NLP and voice assessment tools serving China's leading robotics company, with 4M users.


Interview with AI Specialist Dhonam Pemba

#artificialintelligence

For our latest expert interview on our blog, we've welcomed Dhonam Pemba to share his thoughts on artificial intelligence (AI) and the journey behind founding KidX AI. Dhonam holds a PhD in neural engineering, is a former rocket scientist, and is a serial AI entrepreneur with one exit. He was CTO of Kadho, which was acquired by Roybi for its Voice AI technology. As Chief Scientist at Kadho Sports, he served clients in MLB, USA Volleyball, the NFL, NHL, NBA, and NCAA. His latest company, KidX, is in the AI edtech space, where he has built NLP and voice assessment tools serving China's leading robotics company, with 4M users.


AI Edtech Entrepreneur's Journey from Neuroscience to Toys

#artificialintelligence

Dr. Dhonam Pemba is the CEO and Co-Founder of KidX; he is a neural engineer by education, a former rocket scientist by profession, and an AI entrepreneur by trade. He received his Biomedical Engineering undergraduate degree from Johns Hopkins University, and his PhD from the University of California, Irvine, also in BME, where he worked on neural interfaces for his thesis. Can you tell me about the NASA JPL project and how it was related to your PhD work? My PhD work was building micro implantable neural implants, very similar to the work that Elon Musk's company Neuralink is now doing.


Google Earth relaunches today with stunning detail

Daily Mail - Science & tech

Google has today launched a re-imagined version of its free Earth mapping service, weaving in storytelling and artificial intelligence. The new programme lets people get a close-up look of the planet from the comfort of their computers, smartphones or tablets. The new-look Google Earth enables its users to learn about far-flung corners of the globe under the guidance of scientists from Nasa and prestigious research institutions. Google Earth's new start-up screen offers a global view of the Earth. 'This is our gift to the world,' Google Earth director Rebecca Moore said.